Comparing linguistic interpretation schemes for English corpora

Authors

Eric ATWELL

George DEMETRIOU

John HUGHES

Amanda SCHIFFRIN

Clive SOUTER

Sean WILCOCK

Abstract

Project AMALGAM explored a range of Part-of-Speech (PoS) tagsets and phrase-structure parsing schemes used in modern English corpus-based research. The PoS-tagging and parsing schemes include some that have been used for hand annotation of corpora or manual post-editing of the output of automatic taggers or parsers, and others that are the unedited output of a parsing program. Project deliverables include: a detailed description of each PoS-tagging scheme, and a multi-tagged corpus; a “corpus-neutral” tokenization scheme; a family of PoS-taggers for 8 PoS-tagsets; a method for “PoS-tagset conversion”; a sample of texts parsed according to a range of parsing schemes (a MultiTreebank); and an Internet service allowing researchers worldwide free access to the above resources, including a simple email-based method for PoS-tagging any English text with any or all of the PoS-tagsets. We conclude that the range of tagging and parsing schemes in use is too varied to allow agreement on a standard, and that parser evaluation based on ‘bracket-matching’ is unfair to more sophisticated parsers.
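The closing point about ‘bracket-matching’ can be made concrete with a small sketch. The abstract does not define the metric, so the code below assumes a common unlabelled form: a candidate bracket counts as correct only if the reference parse contains a bracket over the same token span. The function name `bracket_scores`, the example sentence, and all spans are hypothetical illustrations, not material from Project AMALGAM.

```python
def bracket_scores(candidate, gold):
    """Unlabelled bracket precision and recall over (start, end) token spans.

    Assumption: a bracket matches only if an identical span exists in the
    reference parse. This is one common reading of bracket-matching, not
    necessarily the exact metric the authors evaluated.
    """
    candidate, gold = set(candidate), set(gold)
    matched = len(candidate & gold)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical 5-token sentence: "the old man saw us"
# Reference parse from a shallow scheme: sentence, subject NP, VP.
gold = {(0, 5), (0, 3), (3, 5)}

# A parser using a more detailed scheme finds the same three constituents
# and adds two finer-grained ones: "old man" and the object "us".
detailed = gold | {(1, 3), (4, 5)}

precision, recall = bracket_scores(detailed, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these assumptions the detailed parser recovers every reference bracket, yet its precision drops because the two extra brackets have no counterpart in the shallower reference scheme. The extra structure is penalised whether or not it is linguistically sound.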

Similar resources

A Cross-linguistic and Cross-cultural Study of Epistemic Modality Markers in Linguistics Research Articles

Epistemic modality devices are believed to be one of the prominent characteristics of research articles as the commonly used genre among the academic community members. Considering the importance of such devices in producing and comprehending scientific discourse, this study aimed to cross–culturally and cross-linguistically investigate epistemic modality markers as an important subcategory...

Detecting Abstract Linguistic Properties Through the Study of Corpus Data

For obvious reasons, the focus of much corpus linguistic research is on the surface word forms and strings that are available in all electronic corpora. As linguists, however, we are aware that language has structure which is not directly audible/visible on the surface. In order to study that invisible structure more effectively, we have been creating, in collaboration with others, a range of a...

Generating Conceptual Metaphors from Proposition Stores

Contemporary research on computational processing of linguistic metaphors is divided into two main branches: metaphor recognition and metaphor interpretation. We take a different line of research and present an automated method for generating conceptual metaphors from linguistic data. Given the generated conceptual metaphors, we find corresponding linguistic metaphors in corpora. In this paper,...

Application of a Corpus to Identify Gaps between English Learners and Native Speakers

In order to develop effective computer-assisted language teaching systems for learners of English as a foreign language, it is first necessary to identify gaps between learners and native speakers in the four basic linguistic skills (reading, writing, pronunciation, and listening). To identify these gaps, the accuracy and fluency in language use between learners and native speakers should be com...

Inferring language change from computer corpora: Some methodological problems

As the number and size of computer corpora grow, linguistic researchers are increasingly using them to study changes in language over time. Comparing usage at one point in time with usage at a later or an earlier period seems a stunningly simple and Saussureanly impeccable method of studying language change. Needless to say, the reality is rather different. This paper identifies some of the meth...

Publication date: 2007
